Estimating the Power Spectrum of 
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We develop two methods for estimating the power spectrum, Ci, of the cosmic microwave back- 
ground (CMB) from data and apply them to the COBE/DMR and Saskatoon datasets. One method 
involves a direct evaluation of the likelihood function, and the other is an estimator that is a 
minimum-variance weighted quadratic function of the data. Applied iteratively, the quadratic es- 
timator is not distinct from likelihood analysis, but is rather a rapid means of finding the power 
spectrum that maximizes the likelihood function. Our results bear this out; direct evaluation and 
quadratic estimation converge to the same Ces. The quadratic estimator can also be used to directly 
determine cosmological parameters and their uncertainties. While the two methods both require 
0{N'^) operations, the quadratic is much faster, and both are applicable to datasets with arbitrary 
chopping patterns and noise correlations. We also discuss approximations that may reduce it to 
0{N^) thus making it practical for forthcoming megapixel datasets. 



the likelihood function. 

In Section II we introduce the likelihood function, ex- 
plain our method for evaluating it directly, and derive 
the quadratic estimator. We apply quadratic estimation 
and direct evaluation to the case of COBE/DMR ||^ in 
Section III. Both methods involve iteration and wc find 
that for both, the iteration converges rapidly, with ex- 
cellent agreement between the two methods on the final 
CeS and their variances. However, the higher moments 
of the probability distribution cannot be estimated with 
the quadratic approach — and we find that there are sig- 
nificant deviations from Gaussianity in the likelihood as 
a function of Ce. We discuss these differences, problems 
arising from them and possible solutions. 

For COBE/DMR we estimate every individual Cc (for 
2 < i < 24) since the data allow us to determine these 
with some precision. The quadrupole, C2, has received 
more attention in previous work than any of the other 
moments because of its small value and because it is the 
most susceptible to contamination by emission from our 
galaxy jl^. We also find the quadrupole to be quite 



I. INTRODUCTION 

Observations of the cosmic microwave background 
(CMB) anisotropy are providing strong constraints on 
theories of cosmological structure formation. Planned 
observations have the potential of providing constraints 
on the parameters of these theories at the percent level 

Predictions of theories for CMB anisotropy are statis- 
tical in nature. For many theories, the complete descrip- 
tion is given by the power spectrum, Ci, defined below. 
Thus extraction of Ci from the data is of utmost im- 
portance as an end in itself and for purposes of "radical 
compression" 

With the assumption of the Gaussianity of the data, 
the likelihood function — the probability of the data given 
a particular theory — takes a simple form; with the fur- 
ther assumption of a prior uniform in the parameters, 
the likelihood is proportional to the posterior distribu- 
tion of the parameters, given the data. This is precisely 
the quantity one wants and thus likelihood analysis has 
been used extensively for calculating the constraints on 
parameters given by CMB data. This is true whether 
the parameters are those of the power spectrum itself or 
cosmological parameters. 

Another approach has been to form estimators that 
are quadratic functions of the data, e.g., Q. This proce- 
dure has been improved recently by the use of minimum- 
variance weighting of all the pairs of data points 0,1). 
In this paper we present a unification of the quadratic 
and likelihood approaches. We show that, when used it- 
eratively, the minimum-variance weighted quadratic es- 
timator is a fast technique for finding the maximum of 



small, C2 = 149 ± 126 fiK\ compared to C2 = 810 fiK^ 
for COBE- normalized standard cold dark matter (CDM). 
However, due to the strong skewness of the probabil- 
ity distribution for C2, 25% of the probability is ac- 
tually above the COBE-normalized CDM value of C2. 
Thus consistency with relatively flat models like standard 
CDM does not require the quadrupole power to have been 
reduced by systematic errors. 

For most observations, which only cover a small frac- 
tion of the sky, estimating every Ce is not possible. One 
must be content with estimating the power spectrum ei- 
ther with some binning in £ or through some other param- 
eterization. Therefore in Section IV we discuss binning 
and rebinning. Then in Section V we apply the meth- 
ods to estimate, from the Saskatoon (SK) data ||ll[], the 
power in ten bins from £ = 19 to I — 499. 

Power spectrum estimation can be used as a form of 
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data compression where the estimates of Ci and their co- 
variance matrix are then used to constrain cosmological 
parameters. Because of the great simphfications involved 
in working with power spectrum estimates instead of pix- 
ehzed data, this is currently the only practical procedure 
for using all the CMB data. Such exercises have been 
conducted, e.g., In Section VI we discuss the 

approximations involved in such a procedure and meth- 
ods for reducing the resulting inaccuracies, and in Sec- 
tion VII we apply these results to future balloon- and 
satellite-borne experiments. 

Unfortunately, direct evaluation of the likelihood func- 
tion is an 0{N^) operation, where N is the number 
of data points. And it must be evaluated many times. 
Thus for N ^ 10, 000 this procedure becomes rapidly in- 
tractable on modern workstations — at least for the most 
straightforward implementations. Although the speed 
of likelihood analysis has been greatly increased by use 
of signal-to-noise eigenmode compression |^,^ -18 , this 
procedure still requires an 0{N^) operation to be per- 
formed at least once. 

Further speed is necessary if we are to be able to ana- 
lyze forthcoming megapixel datasets. The quadratic es- 
timator may offer a means of achieving this speed. We 
emphasize that as we have applied it here it is still an 
0{N^) operation, but believe that approximations may 
be made in a controlled manner to reduce it to 0{N'^). 
We discuss these problems and possible solutions in Sec- 
tion VIII, as well as explicitly outline our algorithm for 
power spectrum estimation from CMB data. 



II. METHODS: LIKELIHOOD ANALYSIS 

We begin by establishing the notation used for describ- 
ing the pixelized data of a CMB observation. We also 
define the power spectrum, C'i, and the likelihood func- 
tion. With this common groundwork complete, we then 
move on to a description of the two different methods for 
estimating Ci. 



A. The Likelihood Function 

In general, CMB observations are reduced to a set of 
binned observations of the sky, or pixels, Ai, i = 1 . . . N 
together with a noise covariance matrix, Cnw ■ We model 
the observations as contributions from signal and noise. 



A,; - 



(2.1) 



We assume that the signal and noise are independent 
with zero mean, with correlation matrices given by 



so 



(A, A,,) = Cm' +Cn 



(2.2) 
(2.3) 



where (. . .) indicate an ensemble average. With the fur- 
ther assumption that the data are Gaussian, these two 
point functions are all that is necessary for a complete 
statistical description of the data. 

One important complication to the above description 
comes from the existence of constraints. Often the data, 
Ai, are susceptible to a large source of noise, or a not- 
well-understood source of noise that contaminates only 
one mode of the data. For example, the average value of 
Ai may be very poorly determined. In this case, the 
average is usually subtracted from Aj. Similarly, the 
monopole and dipole are explicitly subtracted from the 
all-sky COBE/DMR data, because the monopole is not 
determined by the data and the dipole is local in origin. 
In general, placing any constraint on the data or some 
subset thereof, such as insisting that its average be zero, 
results in additional correlations in A^ . We take this into 
account by adding these additional correlations, Cc, to 
the noise matrix to create a "generalized noise matrix," 
Cn, where Cn = Cn + Cc- In the limit that the am- 
plitude of Cc gets large, this is equivalent to projecting 
out those modes which are now unconstrained by the 
data [l9| ], but this scheme is numerically much simpler 
to implement. Thus in the text below we always write 
the noise matrix as Cn instead of C„. The details of 
this procedure for handling the effect of constraints are 
explained in Appendix 

Due to finite angular resolution and switching strate- 
gies designed to minimize contributions from spurious 
signals (such as from the atmosphere), the signal is gen- 
erally not simply the temperature of the sky in some di- 
rection, T{x), but a linear combination of temperatures: 



dQ.H{x, Xi)T{x) 



(2.4) 



where H(x,Xi) is sometimes called the "beam map" , "an- 
tenna pattern" or "synthesis vector" . If we discretize the 
temperature on the sky then we can write the beam map 
in matrix form, Si = X]„ Hi„Tn. 

The temperature on the sky, like any scalar field on a 
sphere, can be decomposed into spherical harmonics 



(2.5) 



em 



If the anisotropy is statistically isotropic, i.e., there are 
no special directions in the mean, then the variance of 
the multipole moments, is independent of to and we 
can write: 



CiSfjiSn 



(2.6) 



For theories with statistically isotropic Gaussian initial 
conditions, the angular power spectrum, Cg, is the entire 
statistical content of the theory in the sense that any 
possible predictions of the theory for the temperature of 



2 



the microwave sky can be derived from it Even for 
non-Gaussian theories, the angular power spectrum is a 
very important statistic, probably the most important 
one for determining the viabiUty of the most popular non- 
Gaussian theories. However, the techniques we present in 
this paper for estimating the power spectrum assume that 
the fluctuations in both the sky signal and experimental 
noise are Gaussian. 

The theoretical covariance matrix, C'tw , is related to 
the angular power spectrum by 



2£+l 
An 



CeWu'ii), 



where 



M^^.'W = I]^^nifz'„'P£(cOS0„ 



(2.7) 



(2., 



is called the window function of the observations and 
9nn' is the angular separation between the points on the 
sphere labeled by n and n' . 

Let us define the quantity Ce = £{£ + l)Ce/{2n). This 
is useful for two reasons: it is the logarithmic average 
of Cg that gives the variance of the data and (therefore) 
for scale-invariant theories of structure formation, Ci is 
roughly constant at large scales. 

Within the context of a model, the Ce depend on some 
parameters, Qp, p = 1 . . . Np which could be the Hub- 
ble constant, baryon density, redshift of reionization, etc. 
The theoretical covariance matrix will depend on these 
parameters through its dependence on Ci. We can now 
write down the likelihood function for Op, which is equal 
to the probability of the data given Op. 



C^iap) = P{A\ap) 



(27r)^/2|CT(ap) + Cw|i/2 



exp 



(2.9) 



One can then search for the parameters Up that maximize 
this likelihood. 



B. Direct Evaluation of the Likelihood Function 

First, we must choose a set of parameters to charac- 
terize the theoretical covariance, Ct- For a given class 
of cosmological theories {e.g., adiabatic perturbations 
from inflation) we can calculate the power spectrum from 
some set of parameters like the densities of various com- 
ponents, rix, the shape of the primordial power spec- 
trum, the Hubble constant, etc. A detailed exploration 



* Non-linear evolution will produce non-Gaussianity from 
Gaussian initial conditions but this is quite sub-dominant for 
£ < 1000. 



of the cosmological parameter space constrained by cur- 
rent CMB and large-scale structure data is given in . 
Alternately, we can describe the power spectrum by its 
actual value at some discrete multipoles or bands of £. 
Moreover, all of the information in the experiment (again, 
for Gaussian theories) is captured in the likelihood func- 
tion for the power spectrum: 



PiA\{ap})^P{A\{Cdap)}) 



(2.10) 



In this paper, we concentrate on the Ci parameterization 
in order to determine the power spectrum directly from 
the data. In principle, we would like to calculate the full 
likelihood as a function of the power spectrum P(A|{C£}) 
for some £ < £max; at the very least we would like to 
find the maximum of this £niax-dimensional function, and 
its properties {e.g., curvature or "width") around this 
maximum. 

Searching such multi-dimensional spaces can be diffi- 
cult; in this case, each evaluation of the likelihood func- 
tion is an expensive 0{N^) matrix manipulation and a 
brute force search through the parameter space would 
take of order (C^/JC^)^'"'"' such evaluations to reach an 
accuracy of SCi. In our applications, we have found that 
the space is sufficiently structureless that a simple itera- 
tion procedure works well for finding the maximum. In 
addition, we do not use all of the individual C'i values as 
separate parameters, since experiments do not have un- 
correlated information about bands of width A£ ^ ^ir/d, 
where 9 is the angular extent of the survey |2^. For 
COBE/DMR, we bin in bands of width A£ = 2 - 3 for 
£ > 25 and only consider £ < 35; above this multipole we 
give the power spectrum a constant shape and amplitude 
(that of COBE- normalized standard CDM, in this case). 
For SK, we have tried bins of various widths, the choice 
of which we will discuss below. 

At the first iteration, we choose some appropriate 
starting Cg. For each £ (or band), we hold all other Cgs 
fixed while the one of interest is allowed to vary; in the 
appropriate signal-to-noise basis, the likelihood as a func- 
tion of this single parameter is trivial to compute (see 
Appendix That is, for each band labeled by B, we 
rewrite the correlation matrix as 



Ct + Cn — qbCb + C, 



N* 



(2.11) 



(no sum over B) where the effective signal and noise ma- 
trices are given by 

Ci3..' =^^^C,W^,,.(£); 



2/11 

Cn'u- = Cnw + -^ClWu'{L). (2.12) 



4tt 



and calculate the likelihood as a function of the adjust- 
ment factor qb alone. After going through all the £ bands 
of interest, we then update the starting power spectrum 
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by multiplying the Cis in each band by the qb that max- 
imized the likelihood function. We then repeat. Con- 
vergence is achieved when all the qbS equal unity. For 
COBE/DMR, starting from COBE-normalized standard 
CDM (already a good fit) we achieved convergence at the 
few percent level after only two such iterations for £ < 20; 
after 10 iterations, convergence is everywhere better than 
10-4. 

There is a drawback to the procedure as described 
so far, compared to what could be achieved by more 
ambitious methods such as simulated annealing pl|,p|. 
Even though we find the maximum of the likelihood func- 
tion, we haven't accurately determined its shape — only 
the shape along each Ce while the others are held con- 
stant (i.e., parallel to the axes of the ^max-dimensional 
space) . And we have no estimate for the correlations be- 
tween the uncertainties in each estimate of Ci. Below, 
we shall see how to use the Fisher matrix for an estimate 
of these correlations. Clearly, a more ambitious mini- 
mization strategy would be preferable; we have chosen 
not to implement one since the quadratic estimator to 
be derived below achieves this end without any explicit 
likelihood calculation. 

We have also considered the possibility of estimating 
each Ci assuming no other knowledge of all of the oth- 
ers. That is, we have attempted to marginalize over the 
Ce values outside of each band. This is equivalent to 
the procedure outlined in Appendix ^ for marginalizing 
over removed constraints (averages, dipoles, etc.) and 
foreground templates. However, in this case, the method 
fails to constrain the power spectrum. In performing this 
marginalization, we effectively allow an arbitrary amount 
of noise consistent with any power spectrum at all out- 
side of the band of interest. That is, we multiply the 



second term in Eq. 2.12 by a very large number to make 



the variance in those modes larger than the noise or (ex- 
pected) signal. For a perfect, all-sky observation, this 
would not be a hindrance since all the multipoles are in- 
dependent. For any realistic observation, however, there 
is aliasing of different multipoles together; some modes 
of the data (defined, for example, by the eigenmodes of 
Appendix ^) that are being marginalized over will have 
nonzero contributions from within the £-band of interest. 
Thus, the new noise spectrum alone will span the space 
of possible signals, consistent with having no power at 
all in the band. This just reinforces the idea that any 
unknown noise in the observation should ideally be com- 
pletely "orthogonal" to the quantities we are attempt- 
ing to estimate (which will often be the case when the 
marginalization technique is used for experimental con- 
straints or foreground removal). 



peak. For well-constrained parameters this approxima- 
tion should be good except in the tails of the distribution. 
A Gaussian approximation to the likelihood function can 
be obtained by truncating the Taylor expansion of InC 



about Up at second order in Sap-. 



In £(a + Sa) = In £(«) + ^ 

p 



dliiC{a) 



da 



Sa„ 



■p 



l^d^lnC{a), , 

pp' ^ ^ 



(2.13) 



This Gaussian approximation is useful because now, in- 
stead of making multiple evaluations of the likelihood 
function, we can directly solve for the 5ap that maximize 
it: 



5a„ 



E 

p' 



92 \nC{a) 



dopdapi 



d\nC(a) 



dc 



'■p' 



(2.14) 



The first derivative is given by: 



dlnCja) 
dap' 



1 



Tr [(AA^ - C) {C-^CryC-^)] (2.15) 



and the second derivative by 

^(a) ^ ln£(a) 
dopdap' 

= Tr[ (AA^ - C) {C-^Ct,pC-'^Ct,p'C-^ 



+ -Tt{C-'Ct,pC-'Ct,p') 



- -jC ^Ct,pp'C -^)] 
(2.16) 



where Tr is the trace, C = Ct+Cn is the total covariance 
matrix and ^p = d/dap. We call the second derivative the 
curvature matrix and give it the symbol J-"*^"-* where the 
(a) indicates that we have taken the derivative of \nC 
with respect to a. 

To the extent that the likelihood function is not Gaus- 
sian, we will not have correctly solved for its maximum. 
Thus we iterate. The closer we get to the maximum, the 
better the quadratic approximation to hiC will become. 
This is exactly the Newton-Raphson method for finding 
the zero oi dlnC/dap. The procedure is not fool-proof — 
there is the risk of getting trapped in a local extremum. 
In practice we have found the likelihood function to be 
sufficiently structureless that this is not a problem. 



D. Quadratic Estimator 



C. Gaussian Approximation to the Likelihood 
Function 

If the likelihood function is continous and has a peak 
then it can be approximated as a Gaussian near the 



The above procedure is not exactly what we do in prac- 
tice. Calculating the curvature matrix is a computation- 
ally intensive procedure. Matters simplify significantly 
if we settle for the ensemble average quantity, called the 
Fisher matrix, F: 
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(a) 



(2.17) 



When taking this ensemble average, denoted by (...), 
we assume that the theory is correct and therefore that 
(AA^) = C. 

Note that the Fisher matrix, hkc the curvature matrix, 
is defined with respect to particular parameter choices. 



If we transform to a new set of parameters, 



then 



the Fisher matrix for these new parameters is -F^"^ 



l\T 



a proof of this 
the definition of the curvature matrix in Eq. 2.16 



where Zppi = dhp/dapi. Tegmark offers 
with our approach it is obvious from 



Replacing the curvature matrix with the Fisher matrix 
makes our estimator for ap quadratic in the data. A: 



2 

p' 



(2.18) 



This is what we call the quadratic estimator. The right 
hand-side depends on a^, so we pick an initial a^, cal- 
culate the correction 5ap, and then repeat for the new 
value of ap. Note that the power spectrum estimate is 
not constrained to be positive-definite — a point we dis- 
cuss below. 

If we assume that the input theory is correct, then 
(AA"^) = C and therefore Eq. imphes {Sap) = 0. 
Similarly, one can work out that (SapSapi) = iF'"°'^)ppi- 
This is to be expected since for a Gaussian distribution, 
the two-point function is the inverse of the curvature ma- 
trix. 

Although the quadratic involves using the Fisher ma- 
trix F as an approximation to the full curvature matrix 
J-', both procedures iterate to the same parameters, the 
maximum of the likelihood function. This is because both 
J- and F are invertible, so Sap = from either procedure 
implies dhiC/dap = 0. Thus, when applied iteratively, 
the quadratic estimator will find the exact location of the 
likelihood peak; the only approximation comes in using 
the Fisher matrix to approximate the errors, rather than 
the full curvature matrix (and below we show that in the 
cases studied, this is a very good approximation; more- 
over, having found the location of the peak, the curvature 
there can be calculated explicitly if necessary). 

Our procedure is very similar to that of the Levenberg- 
Marquardt method [ pl| for minimizing a with non- 
linear parameter dependence. There the curvature ma- 
trix (second derivative of the x^) is replaced by its ex- 
pectation value and then scaled according to whether 
the is reduced or increased from the previous itera- 
tion. Similar manipulations of the Fisher matrix may 
possibly speed convergence of the likelihood maximiza- 
tion, although one would want to do this without direct 
evaluation of the likelihood function. 



In our applications to COBE/DMR and SK we have 
found that iteration converges quickly. Iteration is espe- 
cially important for the calculation of the error covari- 
ance matrix. Without iteration, the errors are deter- 
mined entirely by the initial theoretical assumptions and 
are not influenced by the data. (Of course, this is exactly 
why the Fisher matrix has been so useful in determining 
how well future observations will be able to determine 
parameters.) 

As we have defined it so far, the quadratic estimator 
with the iteration procedure is a method for finding the 
maximum of the likelihood. Only if one takes the prior 
probability to be uniform in the parameters is this equiv- 
alent to maximizing the posterior probability. We could, 
of course, include different priors directly in the defini- 
tion of the est imat or. The derivation would then begin by 
changing Eq. |t| to a Taylor expansion of In Ppost where 



Fpost oc >CPprior is the posterior probability distribution 
and Pprior is the (differentiable) prior distribution. 

To see how the quadratic estimator works, we can take 
a one-dimensional example. Consider a function /, that 
is approximately quadratic. If we take its first and second 
derivatives about some point, xq (= 0.7 in the figure), we 
can construct the function fg which approximates /. By 
finding the value of x that maximizes fg we have a guess 
as to the maximum of /. Now, for a further refinement 
of the estimate, a new /q can be calculated based upon 
the properties of / at this new value of x. (Note that 
the full quadratic estimator of Eq. 2.18 includes the fur- 



ther approximation of using the Fisher matrix (Eq. 2.17) 
rather than the actual curvature matrix (Eq. 2.16| ) for 
the second derivative of the log-likelihood.) 

Quadratic Estimation 




FIG. 1. A one-dimensional example of quadratic estimation. 

The applications we discuss in the following all use the 
CiS as the parameters Op. In this case. 
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We also consider the power spectrum averaged over some 



bands B with some assumed shape C\ 



shape _ 



in that case, 



we average the above weighted by the shape: 



Ctv 



Cth'iC', 



shape 



(2.20) 



However, there is also the interesting possibility of tak- 
ing the dp clS the cosmological parameters that affect the 
spectrum, J7, h, ns, f^b, etc. Iteration in this case should 
also converge to the likelihood maximum. 

We note that the quadratic estimator discussed here 
can also be derived by finding the quadratic function of 
the data that is unbiased and has minimum variance. 
For a full discussion of the quadratic in this context, see 
|f7|, p2|j2^ . The quadra tic fu nction of the data derived this 
way is the same as Eq. 2.18 . However, the estimate is only 
unbiased if there is no iteration. Since the end point of 
(successful) iteration is the maximum likelihood, the iter- 
ated estimator is, like all maximum likelihood estimators, 
only asymptotically unbiased. 

The methods we have used can also be applied to opti- 
mal determination of the correlation function in angular 
bins. The optimal signal plus noise weighting suggested 
for correlation function determination differs from the 
usual diag[C-i] weighting applied to COBE/DMR. 



E. Single Bandpower Estimation 

It has now become conventional to characterize switch- 
ing experiments which covered small patches of the sky 
by a single bandpower |15|, placing the estimated power 
at a location related to the window function of the ex- 
periment. In this case, there is just one parameter to 
determine. The quadratic statistic reduces to 



Qi 



TtCtC-^CtC-^ 



(2.21) 



If the optimal weight is replaced by the diagonal part 
of C^^, then this is related to the quadratic statistic pro- 
posed by Boughn and Cottingham j2^], which has been 
applied to the COBE/DMR and FIRS data using Monte 
Carlo simulations to define its distribution. With the op- 
timal weighting and the proper inclusion of constraints 
in Cat, the values of Qb and its error estimation are of 
direct use. As discussed above, the iterated quadratic 
estimator for the amplitude will converge to the maxi- 
mum likelihood value. The parameter Qb could be any 
squared amplitude characterizing the assumed theoreti- 
cal Cf, such as the cr| used to characterize the strength 
of the power spectrum on cluster scales. To translate to 
an average bandpower one must evaluate Q b {C^^'^^'^ e) g , 
using an appropriately weighted average of C^^'^^^g over 
the single band B. Issues associated with such averaging 
are addressed in 



IV 



Current and future experiments 
cover large enough patches of the sky that characteriz- 
ing their results by single bandpowers is not useful, but 



evaluation of power spectrum normalization amplitudes 
(such as (Tg) for particular theories will always be of use. 



III. APPLICATION TO COBE/DMR 

We first apply these methods to the anisotropy mea- 
surements of the COBE/DMR instrument jUll. The 
DMR instrument actually measured a complicated set 
of temperature differences 60° apart on the sky, but the 
data were reported in the much simpler form of a tem- 
perature map, along with appropriate errors (which we 
have expanded to take into account correlations gener- 
ated by the differencing strategy, as treated in [Q, fol- 
lowing [^). The calculation of the theoretical correla- 
tion matrix includes the effects of the beam, digitization 
of the time stream, and an isotropized treatment of pix- 
elization, using the table given by Kneissl and Smoot , 
modified for resolution 5. We use a weighted combina- 
tion of the 31, 53 and 90 GHz maps. Because most of the 
information in the data is at large angular scales, we use 
the maps degraded to "resolution 5" which has 1536 pix- 
els. Further, we cannot of course observe the entire CMB 
sky; we use the most recent galactic cut suggested by the 
DMR team Q, leaving us with 999 pixels to analyze. We 
use the galactic, as opposed to ecliptic, pixelization. 

For both methods we iterated 28 parameters: C2 to 
C24 individually, C25 to C32 grouped into bins of width 2 
and finally C33 through C35 grouped into one bin. Bin- 
ning is described in more detail prior to the Saskatoon 
application where it is much more important. 
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multipole moment I 

FIG. 2. Maximum-likelihood power spectra from iterative 
direct evaluation of the likelihood function. The curve is 
the zeroth iteration: COBE-normalized standard CDM. The 
points with error bars are, from left to right, the results of 
the first to third iterations. Here, we define the error bars by 
a likelihood ratio of e~^^'^ from the peak. 
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FIG. 3. Iterative quadratic estimation. The curve is the ze- 
roth iteration: COBE-normalized standard CDM. The points 
with error bars are, from left to right, the results of the first 
to third iterations. 
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FIG. 4. We compare the results of the quadratic and direct 
evaluation iteration schemes. At each I, the left error bar 
(square symbol) is for the quadratic, the right (triangle) is 
for the direct evaluation. 

In Figures ^ and ^ we see the results of the iterative 
procedures described in the previous section. Figure ^ 
shows the results of direct evaluation and Fig. || shows 
the results of quadratic estimation. Moments > 10 are 
not shown to avoid clutter. From left to right are the 
first to third iterations, together with their error bars. 
The solid line is the starting point we chose, the power 
spectrum for COBE-normalized standard CDM. For this 
method, we define the estimated Ci as the maximum of 
the likelihood function, and the errors by the value of Ci 
where the likelihood drops by a factor e"""^/^ from that 
maximum. 

First we will discuss the direct evaluation method. The 



iteration converges rapidly. The maximum likelihood val- 
ues of a fourth iteration (shown in Fig. ||) typically differ 
from the third by 1-3% of the error bars (for 2 < £ < 19) 
with a maximum deviation of 7% at i = 12. In the limit 
that the moments were independent, there would be no 
need for iteration; iteration is only necessary because of 
the influence the value of one band has on the best value 
of another. The rapidity of the convergence is expected 
because, as we will see below, the moments are in fact 
fairly uncorrelated. We remind the reader that the error 
bars given by this method — indeed the whole probability 
distribution for each — are calculated by holding the 
others fixed. 

Iteration is also quite rapid for the quadratic estima- 
tor: the maximum likelihood values of a fourth iteration 
(shown in Fig. ^ differ from the third by better than 
1% of the square root of the variance for £ < 24, except 
for the quadrupole and £ = 20 which are slightly worse, 
converging to 3%. Just like the direct method, most of 
the change in the maximum likelihood estimate occurs in 
the first iteration. 

Unlike the direct method, the error bars of the first it- 
eration are quite different from the error bars of the later 
iterations. That is because the error bars (the Fisher ma- 
trix) do not depend on the data, but only on the input 
power spectrum. Therefore the data have had no effect 
on the error bars until the second iteration is reached. 
To the extent that the distribution is Gaussian, these 
error bars accurately represent the uncertainty on each 
parameter; they take into account the correlations with 
the other parameters. The largest changes in the error 
bars from 1st to 2nd, 2nd to 3rd and 3rd to 4th are 610% 
{£ = 2), 60% {£ = 2) and 6.5% {£ = 6), respectively. 
From the 3rd to the 4th, most of the changes are less 
than 1%. 

In the previous section it was claimed that the curva- 
ture matrix is a good approximation to the Fisher matrix. 
We have explicitly checked this for the final iteration and 
find that for £ < 20 most of the Fisher matrix and curva- 
ture matrix derived error bars agree with each other to 
better than 4%. The worst cases are £ = 4 and £ — 5 a,t 
13% and 15%. 

Not only do these methods converge, but they con- 
verge to the same power spectrum, as we see in Fig. |^. 
The differences between the final iterations are less than 
2% of the quadratic error bars for £ < 20, except for a 
4% difference dX £ = 18; at higher moments, the methods 
often do not detect positive power. Note that at multi- 
poles where both methods do detect nonzero power, the 
quadratic method gives error bars which are systemat- 
ically smaller (than those of the direct method) in the 
direction of positive power, and systematically larger to- 
wards lower power. This can be understood as a result 
of the considerable non-Gaussian skewness of the distri- 
bution of power, as seen in Fig. |6[ Also note that when 
the likelihood maximum is at zero power, the quadratic 
estimate is at (physically meaningless) negative power. 
This is to be expected since the existence of a maximum 
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at = implies dlnC/dCi < 0, and therefore the Gaus- 
sian fit to ln£ at = will peak at < 0. 

We have also checked that using the full resolution 6 
data (3881 pixels after the galactic cut) changes the re- 
sults of the maximum-likelihood estimate for the power 
spectrum by much less than one sigma. We have checked 
in detail using the direct evaluation, for which the reso- 
lution 6 results differ from those at resolution 5 by less 
than 5% for £ < 15, except ai I = 6-9 where the differ- 
ence is almost 10-20% and at ^ = 12 and £ = 14, where 
the difference is nearly 50%, still smaller than the large 
error at these £; the higher resolution data give an overall 
normalization that differs by 4% (compared with an error 
of 14%) from that of the best quadratic computed at res- 
olution 5. These differences are consistent with those ob- 
served for different pixelizations and galactic cuts P,p5t; 
note that both the direct evaluation and quadratic pro- 
cedures converge with considerably higher precision than 
these intrinsic errors, even for £ > 15 where the pixeliza- 
tion differences become important and, simultaneuously, 
the noise begins to dominate. 

We also agree at least qualitatively with other calcu- 
lations that we have compared to, in all cases (with de- 
tected power) well within the various reported error bars. 
In Fig. I we show a comparison of our quadratic results 
with those of |^,^ , both of whom use a maximum likeli- 
hood method. Gorski uses a complete search through 
parameter space with "cut-sky spherical harmonics" to 
speed up the calculation; Bunn & White also use 
the Signal-to-Noise transformation of Appendix ^ to in- 
crease speed. The results of our first quadratic iteration 
also have qualitative agreement with Tegmark's imple- 
mentation of the quadratic estimator . 



* This work (quadraLic) 
o Bann/White 

■ Goi'ski (Galactic) 

* Gorski (Ecliptic) 
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the final estimates are unaffected by the choice of initial 
starting place, and the stronger claim that they would 
have resulted from any starting place. From the Fisher 
matrix and from the probability distributions of Fig. ^ it 
should be evident that this likelihood space is fairly struc- 
tureless. We could have started anywhere and converged 
to the same place, although perhaps slightly less rapidly. 
We note though that if the correlations were stronger be- 
tween the different Ces, the direct method would be less 
robust. In particular, if the initial power spectrum were 
much too large, then each multipole moment would try 
to make up for this all by itself by coming out very small. 
Thus there could be large oscillations — conceivably with- 
out convergence. In addition, these correlations, com- 
bined with the width of the likelihood function, imply 
that our iterative direct evaluation method for finding the 
peak may not converge to a unique maximum, as values 
oscillate between iterations; in practice, we have found 
that the changes remain much smaller than the size of 
the error bars, as noted above. Such a broad likelihood 
function indicates that the data do not strongly prefer 
a unique maximum. Nonetheless, if we desire to find 
the exact location of the peak, a more complete search 
through the many-parameter space (as in [ pT[[25| ] ) or the 
use of the quadratic method will be necessary. 

The probability distributions of the parameters are 
different for the two different methods because of the 
approximation of independence by the direct method 
and the approximation of Gaussianity by the quadratic 
method. We can see those differences in Fig. |[ The 
departure from Gaussianity is most dramatic for the 
quadrupole. According to the Gaussian distribution of 
C2, COBE-normalized CDM with C2 = 770 /iK^ is over 
five standard deviations away from the mean, highly 
ruled out. But the strong skewness of the exact likehhood 
function has 25% of the probability for C2 above 770 /xK^. 
This is more probability than there is above only la for a 
Gaussian distribution!^] As £ increases the distributions 
become more Gaussian. The distribution for £ = 21 is 
well-approximated by a Gaussian as expected from the 
central limit theorem since there are approximately 30 
independent modes of roughly equal weight contributing 
to the constraint. 



^Also these quadrupole probability distributions do not take 
into account the possibility of foreground contamination. The 
DMR team |^ have carefully analyzed the foreground con- 
tamination and report C2 = (273 ± 185 ± 360) ijK^ with sta- 
tistical and systematic errors. 



FIG. 5. Comparison of different groups' power spectrum 
estimates, as marked. Gorski computes power spectra in both 
ecliptic and galactic pixelizations of the sky. 



The fact that three completely different methods 
achieve similar results lends support to the claim that 
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FIG. 6. Probability Distributions for individual Ce values, 
as labeled, for a prior uniform in Ci. The solid curve is the 
true likelihood from the last iteration of the full evaluation; 
the dotted curve is the Gaussian approximation from the last 
iteration of the quadratic procedure. For £ — 2, we also show 
the cumulative probability distribution, properly normalized 
to unit probability as — * oo. 

The highly non-Gaussian nature of some of these dis- 
tributions implies that other definitions of the point es- 
timation and the error bars are possible. First, we could 
consider the mean or median of the distribution, rather 
than its maximum, and define errors by the amount of 
enclosed probability. Second, we could also have used 
different prior probabilities for the Ci. Throughout the 
paper, we use a prior uniform in Ci, equivalent to equat- 
ing the posterior distribution with the likelihood itself. 
When the data constrain the power strongly (i.e., small 
error bars), the result is insensitive to the choice of the 
prior; in other regimes, such as the quadrupole, C2, the 
prior has more significance. To investigate this, we have 
also tried other possible prior distributions, along with 
the definition of the point estimate by the median of 
the distribution. A prior P{Ci)dCi oc dCi/^/Ci (which 
is equivalent to a prior uniform in ejth — (C^)^^^) gives 
a median C2 60% higher than the likelihood maximum; 
the highly skewed distribution means that for a constant 
prior the median is 166% higher, while a prior uniform 
in lnC2 has a median only 5% higher. Finally, we have 
also tried a "Fisher Prior ," which uses the element of the 
Fisher matrix (Eq. ^.17 ) corresponding to Up — dth to 
determine the expected amplitude, 

1/2 



P{a. 



Tr 



d\n{CN + atifiT) 



(3.1) 



which is uniform in Ci oc cr^jj at low amplitudes, but uni- 
form in hiCi at high amplitudes, where the smooth tran- 
sition is determined by the scale at which signal-to-noise 
becomes about one. For this prior, the median is about 
20% higher than the maximum likelihood. 
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FIG. 7. Rows of the normalized DMR Fisher matrix (see 
text), at ^ = 2, 10, 21. The solid lines show the matrix at the 
zeroth iteration; the dashed lines for the final iteration. 

In F ig. ^ we show the normalized Fisher matrix, 

-^iP I \l -^eP-^i^e' ^'^ indicate the level of correlations be- 
tween the different CiS. The off-diagonal terms are due 
to the inhomogeneous coverage, the most drastic compo- 
nent of which is due to the galactic cut. This cut discards 
all map pixels with galactic latitude |6| < 20°, with some 
modifications motivated by the DIRBE dust map A 
map with a. \b\ < 20° cut and otherwise homogeneous 
coverage would result in zero overlap between Y^mS with 
opposite parity which explains the near zero values of 
the Fisher matrix for £' — £ odd ||l^ . Modes with sim- 
ilar parity do mix and hence the non-zero elements at 
£' — £ zL 2. Even these off-diagonal terms though are 
much smaller than the diagonal, especially for the lower 
multipole moments which are determined by modes with 
higher signal-to-noise. Iteration does not have much ef- 
fect on the normalized Fisher matrix; the off-diagonal 
components are largely a result of the coverage geome- 
try. 



IV. METHODS: BINNING AND REBINNING 

For the same reason that limited extent in the time 
domain leads to limited spectral resolution in the fre- 
quency domain, uncertainties in Ci and Ci' are strongly 
correlated when £ < 27r/6> where 9 is the linear extent 
of the observed region pQ] . Thus binning moments to- 
gether in bins of width A£ ~ 7r/6' is a sensible thing to 
do. Because of the experimental noise, final bins may 
need to be even coarser to prevent the error bars from 
being excessively large. 

We view binning as a two-step procedure: an initial 
fine binning followed by a rebinning to coarser bins. The 
reason for the first step is that we want to know, within 
each coarser bin, where the constraining information is. 
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The finer binning gives us this knowledge. For pedagog- 
ical reasons, we start with a discussion of rebinning and 
then discuss the initial binning. 



A. Rebinning 

We assume here that the initial binning is the finest 
possible, A£ = 1, since this makes for the simplest expo- 
sition. It is easily generalized to arbitrary initial binning. 
For reasons that will become clear later, we begin our 
discussion of this rebinning procedure by reparameteriz- 
ing the power spectrum in terms of an assumed spectral 
shape, Cf^^^'^. Thus the parameters we are trying to es- 
timate are no longer Ce directly, but the deviation from 
the assumed shape, given by qt. 



Ci = qiCl 



shape 



(4.1) 



If our estimates of individual qi are too noisy then we 
can average them together into coarser bins, which we 
will label by the subscript B. We wish to do this in a 
"minimum variance" manner. That is, we want to find 
Qb that minimizes 



X 



'F^P (QB-qt) 



where the sum (like all sums in this subsection) extends 
over the width of the new and coarser bin. The Fisher 
matrix appears here because, in the Gaussian approxi- 
mation to the likelihood function, the Fisher matrix is 
the inverse of the parameter covariance matrix. Compli- 
cations due to non-Gaussianity are discussed in 



VI 



It is easy to show that the solution to this minimization 
problem is given by 

(9) 



Qb = (4.3) 



The new parameters Q b have the Fisher matrix, F^g, — 

'^w Ffj} where the sum over t extends across bin B 
and the sum over £' extends across bin B' . We see that 
Qb averages qi over the filter f^j = F^j) . The ^q) 
superscript indicates that this filter is for averaging q^s. 

As the constraints on the power spectrum become 
tighter, it is inevitable that we will move from plotting 
averages of Cf (band-powers) to plotting q^ in what we 
call g-space, or d evia tion space. We show some examples 
of this later in § VII where we simulate future data sets. 



Therefore it is worth exploring this space a little further. 
One question to answer is: what i value should be used 
for locating Qb horizontally on a graph? We advocate 
choosing this ^eff so that for a band ranging from to 

4 



E ^bI — y^. fj 



(9) 

Be ■ 



(4.4) 



With this definition, 50% of the weight that constrains 
qB comes from £i < £ < i^s and the other 50% comes 
from Icff < i < h- 

Although comparison of theories with the data will oc- 
cur in space, we wish to translate our values into the 
familiar C^-space. To do this we must define a suitable 
average of cf^^^"^ over bin B, C^g^^°, with which to mul- 
tiply Q B and a suitable i value at which to plot the error 
bar, ics- The best weighting to use for this is debat- 
able. We emphasize that the ambiguities associated with 
the translation from Qb to a power estimate, Cb only 
affect plotting — not the comparison of theory with data. 
Furthermore, we have tried several different weighting 
schemes and found negligible differences in their values of 
£cS and Cb, so long as they are proportional to f^J which 
encodes the signal-to-noise information in the band. 

To motivate a particular averaging we first rewrite 
Eq. 4.3 in terms of Ce and its Fisher matrix: 



(C)^shapc 



shape j-i(C)^shapc 



(4.5) 



The relation between Qb and in the above equation 
suggests that the following filter be used to calculate 



(4.2) C 



shape 
B ■ 



f (C) _ p(C)^shape _ j.(q) ;^shape 

J Be ^ z_^^u'^e' —JBel^e 



since this is the weighting of each in Eq. 
to make our power estimates we use 



(4.6) 



Therefore 



^shapc 



with the result that 



E/'{C)^shape 
e J Be '^e 
AC) 
l^eJ Be 



n nt /^shapo 



(C) 



Ce 



AC) 
l^e Je 



(4.7) 



(4.8) 



I Be 



The role of the filter function, fg^ is exactly that of We/£ 
in the band-power procedure of where We is the 
trace of the window function matrix defined in Eq. 
We will develop this connection more later. For now, we 
define £cs, £'^ and £~, exactly as was done in [|l^, so that 
we can plot data points properly located in £ space with 
horizontal error bars: 



toft 



L^e Be 
AC) 
l^eJ Be 



and £^ and £'^ are where Ifg) has fallen to e^^/^ of its 
maximum value. We remind the reader that every sum 
over £ in this section is only over the values of £ within 
band B. 



(4.9) 



e=t^a 
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B. Initial Binning 

One may wish to estimate fewer parameters than every 
muhipole moment right from the beginning. In this case 
one would parameterize the spectrum as 



(4.10) 



where xb{() is one when £ is within the range of band 
B, and zero otherwise. 

To convert qb to a power estimate, Cb, we need an av- 
erage of the shaped spectrum over band B. A useful con- 
version factor is given by Eq. 4/7. Of course, in order to 
calculate by Eq. |j one needs to know the Fisher 

matrix at every i — which is a calculation we're trying to 
avoid by using coarse binning. Once again though, as 
long as the binning is not too coarse, the details of the 
averaging are unimportant. If the binning is fine enough, 
then a simple average (uniform in i) will suffice — that is, 
take 



^shapc 



(4.11) 



here, the denominator is simply the width of the bin. 
This is what we have done in our applications (although 
see § VI for how this can be improved by use of analytic 
knowledge of the Fisher matrix). 

As is usually the case with binning, we want to make 
the bins as fine as necessary to capture all the informa- 
tion but no finer since that means extra work. A lower 
limit to the bin sizes comes from the fact that fluctua- 
tion power from Ci will be indistinguishable from that 
from if 1^ — ^ I ^ 2tt/6, where 9 is the linear extent 
of the observed region, as already mentioned. We may 
wish to make our initial bins even coarser. Some con- 
siderations to keep in mind are that if one is trying to 
reduce sensitivity to uncertainty in the power-law index 
then logarithmic spacing produces equal shape sensitiv- 
ity in each bin. If the chief shape uncertainty comes from 
features with a characteristic wavelength, e.g., Doppler 
peaks, then a linear spacing produces equal shape sensi- 
tivity in each bin. 



V. APPLICATION TO SASKATOON 

We now apply our methods to the Saskatoon (SK) 
dataset |l^]. The SK data are reported as complicated 
chopping patterns (i.e., beam patterns, H, above) in a 
circle of radius about 8° around the North Celestial Pole. 
The data were taken over 1993-1995 (although we only 
use the 1994-1995 data) at an angular resolution of 1.0- 
0.5° FWHM at approximately 30 GHz and 40 GHz. More 
details can be found in jll] . The combination of the beam 
size, chopping pattern, and sky coverage mean that SK 
is sensitive to the power spectrum over the range i = 50- 
350. The Saskatoon dataset is calibrated by observations 



of supernova remnant, Cassiopeia- A. Leitch and collabo- 
rators 1 28 1 have recently measured the fiux and find that 
the remnant is 5% brighter than the previous best deter- 
mination. We have adjusted the Saskatoon data accord- 
ingly. 

In Fig. 1^ we show the results of our iterated quadratic 
estimator on the SK data, in ten evenly spaced bins 
from £ = 19 to £ = 499. Again, the convergence pro- 
ceeds quite rapidly, although not quite as rapidly as for 
COBE/DMR. Evaluation of the Fisher matrix shows that 
there are approximately 20% anti-correlations between 
neighboring bins. We note in passing that the falling 
power spectrum seen for £ ^ 100 has been noticed by the 
experimenters themselves p9| . 

Wh at we directly estimate is the adjustment factor qb 
of Eq. 4.1C| . As mentioned above, in order to convert this 
to a power spectrum amplitude we need some measure of 
the average power in the bin. Here we have used an aver- 
age uniform in Ci across the bin (Eq. 4.11 ). For the first 
bin, the averaging should probably be weighted more to 
the higher multipole moments than to the lower ones in 
the bin because the sensitivity to the spectrum is increas- 
ing rapidly with increasing £. We will see this rapid rise 
in sensitivity to the power spectrum in the next section 
where we plot the Fisher matrix for a finer binning. 

There is very little information in the three highest £ 
bins. Thus, for the final iteration we binned them to- 
gether and plotted the result as the point with the hor- 
izontal error bar. Because of the coarseness of the bins, 
the filter function for the rebinning is coarse and there- 
fore £cs, £^ and £~ are not determined very well. To 
get the filter function more finely, we need to do a finer 
initial binning, which will be done in the next section. 




100 200 300 400 500 

multipole moment i 

FIG. 8. Quadratic estimates of the power in 10 bins, de- 
rived from the SK data. The curve is the zeroth iteration, 
tilted CDM with n = 1.45 and as = 2.16. The squares are 
from left to right, the results of the first to third iterations. 
The data point with the horizontal error bar is a rebinning of 
the top three bins. 
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To investigate the probability distributions beyond the 
mean and the variance, we used our direct likehhood 
evaluation procedure, starting from the final quadratic 
iteration. The results are shown in Fig. |^. The uncer- 
tainties in the first bin are strongly sample- variance dom- 
inated. In the sample-variance limit the fractional vari- 
ance, {SCiY' /Cj, is inversely proportional to the number 
of independent modes contributing to the estimate. Since 
the first bin is not well-determined we can therefore sur- 
mise that only a few modes contribute to it. With so few 
modes we cannot expect the distribution to be Gaussian 
and thus the strong non-Gaussianity for the first band, 
shown in Fig. is not surprising. 
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FIG. 9. Probability Distributions for the power in bands, 
C_B, as labeled, for a prior uniform in Cs. The solid curve 
is the true likelihood from the direct evaluation; the dotted 
curve is the Gaussian approximation from the third iteration 
of the quadratic procedure. 



VI. METHODS: RADICAL COMPRESSION 

As mentioned above, for Gaussian theories, P{/^\Ci) 
contains all the information that is in the map. If the 
probability distribution were Gaussian in C^, then all the 
information in the probability distribution could be com- 
pressed into a mean and a covariance matrix: 



(6.1) 



By the definition of a Gaussian probability distribution, 
this compression involves no loss of information. The 
"lossless" nature of this compression was pointed out by 
Tegmark although here we emphasize that it is only 
true in the Gaussian limit. We refer to compression to 
the power spectrum as "radical compression" because the 
data reduction is impressive: the information in a map 
with N pixels and an N x N noise covariance matrix 
is now held in less than y/N power estimates and their 
\/N X a//V covariance matrix. 



With compression to yN numbers and a covariance 
matrix, analysis of constraints on cosmological parame- 
ters becomes quite rapid. One simply forms the x^- 

w 

(6.2) 



and simply evaluates it to find the minimum and also 
the one sigma and possibly two sigma confidence regions 
of the parameter space. Here, C^({a}) is the calculated 
spectrum for the paramters Op and M^i = {SCiSCi') is 
some appropriately determined correlation matrix, e.g., 
the inverse of the Fisher matrix or the exact curvature 
matrix for the quadratic method, or a likelihood ratio or 
Bayesian determination for the direct evaluation of the 
likelihood. 

Unfortunately, the probability distribution is non- 
Gaussian, as we have seen. One might think that this 
only causes minor inaccuracies to the method of Eq. 3.2. 



In fact, the problems are of a systematic nature and can 
be quite important. To see this we need only examine the 
case of COBE/DMR. Say we wanted to use our power 
spectrum estimates to measure the best fit amplitude of 
standard CDM, expressed as a prediction for erg, by us- 
ing Eq. 3.2. Using our estimates of Cg from the final 
iteration of either the direct or quadratic estimation pro- 
cedures together with the Fisher matrix from the final 
iteration, we find erg = 1.1 instead of the correct value 
of (Tg = 1-2. This example does not mean that non- 
Gaussianity has made radical compression useless, but 
rather that we must proceed with some care. 

The decrease in power is a systematic effect due to 
the skewness of the probability distributions which al- 
low more positive and less negative fluctuations relative 
to a Gaussian distribution with the same variance. An- 
other way of thinking about it is that those amplitudes 
that fluctuate downwards have their variance reduced 
and thus their weight increased while those that fluctuate 
upward have their variance increased and therefore their 
weight decreased. Contrast this to a Gaussian probabil- 
ity distribution for which the curvature is independent of 
location. Thus one can see that the non-Gaussianity of 
the probability distribution can be very important and 
some care must be used in attempting this radical com- 
pression. 

One solution to the problem may be to find a func- 
tion of C£ whose distribution is more Gaussian than that 
of C£ itself. Motivation for one particular form comes 
from considering the sources of the variance. There is 
a sample-variance contribution which is proportional to 
the power and a noise contribution which is independent 
of the power, thus SCi (x Ce + xg for some appropriate Xi 
related to the experimental noise. According to this pro- 
portionality, the probability distribution for In {Ci + xg) 
might be well-approximated by a Gaussian since its vari- 
ance is independent oiCg. This procedure is under inves- 
tigation H. 



12 



It is the iterated Fisher matrix that overweights (un- 
derweights) the points that fluctuated downward (up- 
ward); to prevent these fluctuations from affecting the 
Fisher matrix, one can iterate on the parameters of a 
smooth function, instead of the amphtudes in fine bins, 
and then use the resulting Fisher matrix for the covari- 
ancc matrix associated with the power estimates in bins. 
This is the method of solution we have adopted here. 

We emphasize that the problems we are discussing are 
not peculiar to the use of the quadratic estimator, but are 
associated with the attempt to compress the probability 
distribution of Ci into a mean and covariance matrix. 
Because of non-Gaussianity, this procedure is necessarily 
approximate. The above being said, we will now assume 
Gaussianity, but always use the Fisher matrix derived 
from a smooth theory curve, and not one derived from a 
bin by bin iteration. 

A more benign problem than non-Gaussianity is the 
existence of correlated uncertainties. Although not a 



Ci/Cf^^^'^Zex, the filter function is / 
Thus 



problem for the x of Eq. 6.2, the correlations do compfi- 



cate direct visual interpretation. We may remove these 
correlations by a linear transformation on the parameter 
space, q q = Zq, where Z diagonalizes the parame- 
ter covariance matrix, ZF~^Z"'" = diag (or, equivalently, 
Z^^ diagonalizes the Fisher matrix, F^^^). 

While having the advantage of uncorrelated uncertain- 
ties, the interpretation of these new parameters them- 
selves has been complicated by the transformation. For- 
tunately, there are transformations with some very use- 
ful properties. Many transformations can lead to inde- 
pendent modes. Hamilton |22| made a lengthy study of 
the different possible diagonalizing transformations and 
found one that has a small i?-space width and is positive- 
definite. His transformation translates to Z — where 
L is the Cholesky factorization oi F; F — LL^ . Another 
useful transformation is given by setting Z = F^/^ the 
Hermitian square root of F ||2^ . Parameter eigenmodes 
involving rotation only to the orthogonal combina- 
tions of bandpowers, rather than scale transformations as 
well, are of great interest and emphasize another point: 
when linear combinations of modes are being taken, we 
have freedom in exactly what the scaling will be. It is ob- 
vious though that if we are representing a bandpower at 
a representative £, we want to ensure that the normaliza- 
tion makes sense since the goal is direct visual comparison 
with theoretical Ce curves. 

In the Gaussian approximation, these linear combina- 
tions are independent. Thus we can now estimate each qj 
independently of the qi for i ^ j. The £mQa;-dimensional 
space has been reduced to imax one-dimensional spaces. 
In fact, for each qj, there is no need to keep the Gaussian 
approximation; one can calculate its complete distribu- 
tion function. If the Fisher matrix is a good approxima- 
tion to the curvature matrix, then, at least near the peak, 
the total likelihood function can be approximately de- 
composed into a product of these one-dimensional likeli- 
hood functions: C{{qi}) « Yii^iQi)- Since q\ = qiZi\ — 
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l^e-l xe 
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If we wish to rebin these uncorrelated estimates, we 
can do so in a minimum variance manner by performing 
the following sums: 



AC) 



where 



AC) 



Ae/3 



(6.4) 



(6.5) 



and A^A = ^i>- ■ These equations are derived in Ap- 
pendix B. 

For COBE/DMR we used Cholesky decomposition to 

get filter functions, f^g^ = Lix/Cf^^^'^ ^ and then binned 
together combinations 1-3,4-5,6-7,8-9,10-12,13-16 and 17- 
27. Each of the combinations' filter functions is shown 
in Fig. ^ together with the power estimates. To avoid 
the systematic underestimate of power discussed above 
we used the CiS from our final iteration, but the Fisher 
matrix from the zeroth iteration. This does not imply 
that the errors are completely unaffected by the data. We 
are using standard CDM as our zeroth iteration precisely 
because it is a good fit to the data. 
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FIG. 10. DMR with ortho-rebinning. 

In order to make accurate filter functions for SK we 
divided it up into 26 bins. Starting from the results of 
our third iteration on the 10 bands of the previous sec- 
tion we estimated the power in these 26 bands with a 
single iteration. The fractional uncertainty in most of 
these bands was greater than unity. In order to make 
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the rebinned statistically orthogonal linear combinations 
plotted in Fig. |l^ we used the n = 1.45, as = 2.16 tilted 
standard CDM Fisher matrix. These are the parame- 
ter values for tilted CDM that maximize the likelihood 
function. 



place of the noise- weighted one. Eq. 6.7 is applied to real- 
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FIG. 11. SK with ortho-rebinning. 

At high £, some of our 26 bands still have significant 
width; their "sub-band structure" may be important. To 
estimate the structure of the filter functions within each 
band, we employ an analytic approximation to the Fisher 
matrix. For a map of the sky with uniform weight per 
solid angle, w, covering a fraction of the sky, /sky, we 
know the Fisher matrix is such that ||l|-||: 



(C) ^, (2£+l)/sky 



Ci + 



1) 



2ttwB^{£) 



(6.6) 



where, for a Gaussian beam, B{£) = e^^ "^i, /2. 

An approximation appropriate for difference ex- 
periments rather than maps is to replace wB'^{£) 

(N) 

with the noise- weighted window function wWg = 
Tt{C^^'W i)jN , where is the window function ma- 
trix of Eq. 2.S. In this uniform case, we also have 



Aq) ^ (2^ + l)/sky 
JBi — o 



(1 + STt) 



'£{£+!) 



(6.7) 



The quantity exi is a measure of the mean square of the 
signal-to-noise ratio in modes £. The STe/i^ + STe) factor 
which appears in the square is the Wiener filter (i.e., the 
optimal signal-to-noise filter). In the Eti ^ 1 limit of 
signal-dominance, fgj {£ + l/2)/sky, half the number 
of £ modes available. This is of course a general result for 
the signal-dominated regime, requiring no assumption of 
homogeneous noise. It is often a reasonable approxima- 
tion to use the usual filter function We = Tr(W^)/A^ in 



izations of power spectra for future balloon and satellite 
experiments in § VII. 



The weight map for COBE/DMR varies gently with 
spatial scale outside of the galactic cut, so we expect 
the analytic approximation eq. |6.6| to be reasonably good 
for it and we see in Fig. |l^ that this is so. The two 
curves with a peak at £ = 10 are sums over the exact 
and analytic Fisher matrices for standard CDM. For the 
analytic form we took /sky = 0.65, — 9.5 x 10"^'^ 
(equivalent to an rms noise of 22 /iK on 7° pixels) and 
the appropriate beam shape. 
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FIG. 12. The Fisher matrix sum, F^'^' for the zeroth 
iteration of DMR for exact (solid) and analytic (dashed). For 
comparison, We/£ (dotted) is also shown. 

For the SK data, the comparison of the 26 band exact 
Fisher matrix and the analytic Fisher matrix approxi- 
mation shows some interesting differences (Fig. |l^). The 
analytic curve is for /sky = 0.005, = 3.3 x 10~^^ 

and 6'fwhin = 0.5°. The deficit at smaller £ is presumably 
due to the differencing schemes that were necessary to 
filter out atmospheric contamination. These are partly 



encoded in the noise-weighted W 
only the beam, B'^{£), was used, as in Eq. |6^ 



but fo r th e plot 
This 



deficit bears on the question of what quality of map it is 
possible to create from the SK dataset; the loss of low £ 
information implies that there will be long-distance noise 
correlations in any map made from the data pH . 

In Fig. 14 we show some rows of the normalized pa- 
rameter covariance matrix Mbb' /\/MbbMb'B', where 
Mbb' = F~g,. The correlations for bin 11 {£ = 120- 
132) extend well beyond the 5£ ~ 7r/0 expected for a 
map — again, this is presumably due to the differencing 
schemes. 
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Gaussian approximation) simplifies Eq. 3.2 some, the ex- 
istence of tlie filter functions complicates it: 
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FIG. 13. The Fisher matrix sum, F^^, for the zeroth 
iteration on 26 band SK for exact and analytic. 
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FIG. 14. Slices of the SK 26 band covariance matrix at 
bands 11, 16 and 21, normalized to unity along the diagonal. 

Returning to Fig. ^ we see that the agreement at least 
at higher £, is good. We consider it to be good enough 
to encourage the use of the analytic form for doing some 
sub-band shaping of the filter functions. To be more 
precise about the procedure, within each band, B, we 
give fj^/ the shape fuiCi with the amplitude chosen 

so that /^'^ — /^^^ . We have applied this shaping to 
the five highest I bins in Fig. |l^. 

One might wonder why the analytic curve in Fig. ^ 
has no peak, corresponding to where sample variance and 
noise are equal contributors to the uncertainty in Ci. The 
absence of the peak is due to the rise in Ct from £ = 20 

to ^ = 200. If we plotted Cf X]^' ^u'' ■• which is related to 
the fractional uncertainty in Ci, then there would be a 
peak near £ = 80. 

While the independence of the power estimates (in the 



1 



Csiia}) - Cb 



(6.8) 



4.7 



where CB({a}) is calculated using the filters in Eq 
We have cast this equation in an intuitive form involv- 
ing the deviation of a measured bandpower Cb from the 
predicted spectrum. This is exactly the appropriate 
to g-space, which emphasizes relative deviations of both 
the data and the theoretical predictions from the fiducial 
spectrum used to calculate the quadratic estimator; i.e., 
the details of how one goes from the estimates of to 
the appropriate bandpower drop out of the x^- 

One might argue that the complication of the covari- 
ance matrix has been traded for the complication of the 
filter functions and there has been no net improvement. 
However, we think that, when binning has been done, the 
use of the orthogonal linear combinations improves, or at 
least simplifies, the process of radical compression. Once 
binning has occurred, one wants to know what the filter 
looks like across the bin. Thus binning implies the use of 
filters and once filters arc being used, the orthogonal lin- 
ear combination approach of providing uncorrelated data 
and filters is simpler than providing correlated data with 
filters and a covariance matrix. 

Experiments typically report broad-band power spec- 
trum estimates, together with the trace of their window 
function, Wi, which can be used to make a filter function, 
fe = Wi/£. These power spectrum estimates have indeed 
been used to constrain cosmological parameters, e.g. p2| . 
Using fe = Wi/£ as the filter function is in general not 
the optimal procedure. Only if Ctw is a multiple of the 
identity matrix is We/i the minimum- variance filter. In 
general, the Fisher matrix-derived filters should be used. 
And they can be quite different; in Fig. |l2|one can see the 
tremendous difference between We/( and the minimum 
variance filter, fg. In the noise-dominated regime (high £ 
for DMR), We/e cx £-^B^{e) whereas fi cx i'^B'^ie). 

In our power spectrum plots we have not included cal- 
ibration uncertainty which is ~ 6% for SK and negligi- 
ble for COBE/DMR. The calibration uncertainty is com- 
pletely correlated across the bands. It can be taken into 
account as a nuisance parameter to be added to the 
expression above ||l2|. Other methods for taking it into 
account are discussed in Hi. 



VII. FORECASTING POWER SPECTRA FOR 
FUTURE EXPERIMENTS 

In this section, we exercise our methods on an instruc- 
tive simple case, homogeneous noise over regular patches 
covering a fraction /sky of the sky. We apply the relations 
to simulating realizations of power spectra and their er- 
ror bars for two planned balloon experiments, MAXIMA 
and TOPHAT, and two satellite experiments, MAP and 
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PLANCK. The results are shown in the famihar Ct space 
in Fig. ^ and in ACi/Ci space in Fig. In this q- 
space, which we beheve wih become more and more uti- 
Uzed as the CMB data starts to converge on a specific 
shape, we compare the (converged) quadratic power es- 
timator values and their error bars with the fractional 
deviation, AC^/C^, of a Ct whose parameters we are test- 
ing from a fiducial shape. Here the shape that entered 
into the power spectrum analysis was a standard COBE- 
normalized cold dark matter model C^, and the model 
used to construct the power spectrum realization was also 
this SCDM one. 

For /sky — 1, power spectrum analysis simplifies con- 
siderably if the weight matrix is diagonal in the 
spherical harmonics basis, since then the Ct and Ct,p 
matrices are. Both the Fisher and curvature matrices are 
also diagonal as long as the bands B do not overlap in t- 
space. For /sky < 1, another simple limiting case involves 
rectangular regions of size NxWpix x NyWpix , consisting of 
square pixels of size Wpix x Wpix ■ The S/N eigenmodes are 
then discrete Fourier components, labelled by a wavevec- 
tor Q, which, to a high degree of accuracy, diagonalize 
Ct and Ct,p, and, by assumption, C^^. The number 
of modes of a given |Q| available in a (i|Q| = 1 band is 
(7V,iVj,tu2„/(2^))|Q|; i.e., /sky2|Q|. Using |Q| -i+l 
which follows from relating an expansion in these modes 
to an expansion in spherical harmonics at high i pO| , the 
number of modes is {2£ + l)/sky, as in the all-sky case. 

The Fisher matrix and the quadratic q-estimator are 
given by 
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g, = (2£ + l)/sky . 



The signal-to-noise factor eti is related to the average 
weight w, the noise- weighted filter function Wi, and Ci 
by Eq. 6.7; and the expression for f)^J is a repeat of 



Eq. 6.7. It is also straightforward to modify Sti to take 



into account the noise in multifrequency experiments, in- 
cluding the expected beam size variation with frequency 
channel 

The combination (1 + e^"°)p^ is the average power in 
the modes with given where 6^"° is the true value of 
the power spectrum, and 



2 -1 \ ^ 

Pt = 9 1 2^ 

/i£ {.£— modes} 



GRD 



where GRD^^j is a Gaussian random deviate for the mode 
of given i labelled by a degeneracy variable /i (the az- 
imuthal quantum number, m, in the spherical harmonic 
case, a discrete angle index in the rectangular patch 
case); individual realizations of this variable are due to 



sample and/or cosmic variance. Therefore, gip^ is dis- 
tributed like with gi degrees of freedom, i.e., with a 
cumulative probability given by an incomplete Gamma 
function with arguments 5^/2 and p^/2. Numerical real- 
izations can be done very quickly. 

The factor denotes an approximate value for the 
signal-to-noise power spectrum. As we have discussed, 
within the band we adopt an assumed shape but allow 



the amplitude to vary. In the iterative scheme, e. 



(*) _ 



elj^j would be the value on iteration 



and e 



Tl 
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(n+1) _ 



77- , c^-Li»a 

(1 -I- 5Qb)4} 'would be the value to be inserted for the 
next iteration. 

Note that Qs is the weighted average of the quadratics 
in sub-bands of width unity, Qi, with weig \Af^^}. There- 
fore, the classic optimal signal-to-noise filter, the Wiener 
filter, £t£/(1 + £Ti) in this case, enters in a fundamental 
way into the power spectrum estimation procedure. 

In the signal-dominated region, Eti S> 1, the weighting 



is just by number of modes, 2/ 



(9) 
Bl 



Thus F does 



not change, and 5Qb converges after one iteration. The 
Cf, error bars change because the {CT) g is multiplied by 
the converged (1 + 5Qb)- In the fine-grained case, where 
B encompasses just one the f^^J weights in the vb 
numerator and the Fisher denominator cancel, leaving 
(1 4- £ti'^)p1 for n > 1 even in the noise- 



-(») _ 



1 

domimated regime. 

We adopt improved specifications especially in beam 
size for MAP ||| and PLANCK [|| over the original 
proposal values; these are likely to evolve for Planck. Of 
the 5 HEMT channels for MAP, we assume the 3 highest 
frequency channels, at 40, 60 and 90 GHz, will be dom- 
inated by the primary cosmological signal (with 30 and 
22 GHz channels partly contaminated by bremsstrahlung 
and synchrotron emission) . MAP also assumes 2 years of 
observing. For Planck, 14 months of observing and cur- 
rent (proposal-modified) values are used. The HEMT- 
based LFI specifications are significantly improved; the 
100, 65, 44 GHz channels, but not the 30 GHz chan- 
nel, were used. For the bolometer-based HFI, 100, 150, 
220 and 350 GHz were used. Dust-contamination will 
certainly affect the 550 and 850 GHz channels. For 
both, it was assumed that 65% of the sky would be use- 
ful. MAP has = 0.8 x lO'^^ and PLANCK has 
w-i = 3.3 X 10-18. 

The balloon forecasts used conservative numbers for 
the bolometer-based TOPHAT [|| and MAXIMA |3|] 
experiments that take account of excess noise associated 
with foreground removal. It was assumed that 65% of the 
region covered by TOPHAT would be useable for CMB 
analysis (/sky = 0.028). The beam is 20' and = 1.5 x 
10""'^^ was chosen. (These noise values are for roughly a 
10 day mission.) MAXIMA has a 12' beam, and /sky = 
0.01, = 0.9 X 10~i^ were chosen. 

Other long duration balloon (LDB) bolometer exper- 
iments such as Boomerang 36 1 should be able to do as 
weh. HEMT-based LDB experiments, such as BEAST 
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p7[ using 40 GHz HEMTs, might also achieve similar 
accuracy. A sharp lower ^-cut was included to treat the 
limited sky coverage for TOPHAT {icut = 12) and MAX- 
IMA {icut — 20); we allowed one mode per £ above this 
until (2i'-|- l)/sky exceeded unity, at which point the num- 
ber of modes was given by the integer part of (2^+l)/sky 
An uncertain part of this approach is the treatment of 
modes of order the size of the patch. 

In Fig. |l^, we have tested various pr escrip tions for 
placing the power and the £ value 
ommended using f'j^^ - 
also be defended; e.g. 



In § IVA 



''^l /Ci, but other schemes can 
weighting by the power in the 
modes, so the numerator averages [£{£+ l)\~^Ci wrt fgj 
and the denominator averages [£{£+ 1)]~^- For a steeply 
falling spectrum, the former places the error bar at high 
£, with power weighting it is placed at slightly lower £. 
In all cases, f^^J is essential to include, but, apart from 
this, the main lesson we have learned is that otherwise 
the prescription does not matter very much. 

The decision on the number and placement of bands 
has also been explored. We prefer using a combination 
of conditions to determine the spacing: when the S /N es- 
timate VB /C^VPbb) exceeds some threshold, or if A\n£ 
across the band reaches some precribed value, then a new 
band is made. If we only used logarithmic spacing, then 
there would be too many bands at lower £ with poorly de- 
termined bandpowers for TOPHAT and MAXIMA. For 
the figures, we chose a S/N minimum of 25, translating to 

1 /2 

a 20% fractional error on ; we also chose A In^ = 0.1. 
Clearly, because of the all-sky nature of MAP and Planck, 
the bands are mostly determined by the logarithmic cri- 
terion. This is only true at the higher £ (but before the 
beam kicks in) for the balloon experiments. 

One of the nice features of the homogeneous sky sim- 
plicity is that we can easily test what different prescrip- 
tions and weightings will do. For example, we have ex- 
plored other ways of finding the maximum and estimating 
the errors. The nonlinear maximum likelihood estimator 
uses the curvature matrix: 
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In the fine-grained case, the amplitude adjusts until 
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so Tbb ^ Fbb and the two power 



spectrum estimates and their error bars are the same. 
This form, though, takes longer to converge than the 
quadratic, and when the deviations are too large the iter- 
ation may not converge. (This is typical for the Newton- 
Raphson meth od.) A comparison of Tbb in Eq. 7^ and 
vb in Eq. 7.1 shows that, for wider bands, we can ex- 



pect plus and minus fluctuations over the band which 
give Vb = 0, but, because of the different weighting, will 
not give Tbb = Fbb- 

For the quadratic operator, another measure of the 
error bars is the variance of the Q b , and this can partly 



take the non-Gaussian spread of the probability function 
for the quadratic into account. For the case considered 
here, this variance is diagonal in B. When the ensemble 
average is taken, the result is 
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Thus in the limit that e 
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re 



approaches e^"''. 



(7.3) 



it reduces to 

F^Q, the Fisher error we quote. However, the p\ correc- 
tions inherent in any realization preclude convergence to 
F^^, in such a way as to increase the error bars for low 
power modes and lowering them for high power modes 
over what F^^ gives. 
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FIG. 15. Comparison of forecasts for the two balloon ex- 
periments, TOPHAT and MAXIMA, with the satellite ex- 
periments MAP and Planck. Bands are required to have a 
signal-to-noise of at least 25 and a minimum spacing in £ 
defined by the logarithmic spacing Aln£ = 0.1. With this 
signal-to-noise binning, the growth in the number of bands 
shows the increasing precision and sky coverage of the experi- 
ments. The error bars are those appropriate to the quadratic 
estimator after convergence. 
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FIG. 16. The forecasted data with error bars are shown in 
{q — ACe/Ct)-spa,ce, in which the relevant comparison with 
the data is the fractional difference between the Ce we are 
testing and C|'"'*"^. A few differences are shown for each case 
by solid lines. They are deviations in single parameters, as 
marked, from the shape theory (q = 0), in this case a stan- 
dard COBE-normalized CDM model with D,b = 0.05. The 
theoretical curves can have their amplitudes adjusted up or 
down to best fit the simulated data. 



VIII. DISCUSSION 

A. Computer Resource Demand 

Evaluating the likelihood function is an 0{N^) oper- 
ation. Although both matrix inversion and determinant 
calculation are 0{N^) operations, it is only the deter- 
minant evaluation that prevents the likelihood analy- 
sis from being 0{N'^). That is because only C~^A is 
needed in the evaluation, not the full inverse, and 
this can be potentially be calculated via 0(7V^) itera- 
tive techniques. Today, a single evaluation of the likeli- 
hood function (and it must be evaluated many times to 
search the parameter space; see Appendix ^) takes ap- 
proximately 45 minutes for the N = 2928 SK dataset on 
a DEC Alpha 250/ev5 and roughly a factor of five less on 
a Cray J90 parallel supercomputer; compressing to 1200 
eigenmodes takes only five minutes on the DEC includ- 
ing overhead from the compression process. Upcoming 
balloon datasets are expected to have at least an order 
of magnitude more data — which translates to a factor of 
1000 in execution time (and 100 in storage requirements). 



Megapixel datasets foreseen for upcoming satellite mis- 
sions are clearly too large to analyze in this way with any 
foreseeable increase in computer speed. 

The quadratic estimator is also 0{N^) despite claims 
that it is 0{N'^) 0. Finding good approximations that 
will reduce it to 0{N'^) is an unsolved problem, crucial 
for further study. 

Even as we have implemented it, the quadratic is much 
faster than direct evaluation of the likelihood function. 
Starting from the signal-to-noise basis, one iteration of 
the quadratic estimator for the 10 SK bands of § VI took 



250 seconds to calculate the window function rotated into 
that basis, and 180 seconds to form the Fisher matrix 
and calculate the quadratic estimator on the DEC Al- 
pha, compressing to 1200 modes. The direct evaluation 
method, in contrast, requires a new rotation to the signal- 
to-noise basis at each band, which is roughly 5 minutes 
per band, using the same 1200-mode compression. 

We have also performed the quadratic calculation via 
direct evaluation of 5ap and the Fisher matrix in the 
pixel basis, calculating quantities like C~^C't,p using the 
Cholesky decomposition of C. This is somewhat faster 
than the same calculation in the eigenmode basis, al- 
though it does not allow easy implementation of signal- 
to-noise compression. 

In Appendix ^ we explicitly calculate the Fisher ma- 
trix in 0{N^) operations (the signal-to-noise eigenmode 
decomposition). To see what makes the quadratic esti- 
mator an 0{N^) operation in general, it helps to rewrite 
it. If we define 



(8.1) 



then {up) = Tr(C ^Ct,p) and we can rewrite the 
quadratic estimator as 



(8.2) 



We can iteratively solve for the vector C~^A and there- 
fore yp can be calculated. The slowest parts of the 
quadratic are the Fisher matrix and (yp) — both of which 
require calculating C~^Ct.p- 

If we can find a good approximation to C^^C't,p that 
can be calculated in 0{N^) operations, then the entire es- 
timation procedure will be 0{N^) for each element of the 
Fisher matrix. Since the Fisher matrix has Np elements, 
the estimation procedure is 0{N'^Np). If the number of 
parameters is roughly the square root of the number of 
pixels (as is expected to be the case for power spectrum 
estimation) then the estimation procedure is 0{N^). For 
the largest maps, we can take advantage of the sparseness 
of the Fisher matrix to only calculate it in a band around 
the diagonal, reducing the process to 0{N'^^^). [If we skip 
the power spectrum and go straight to the estimation of 
cosmological parameters, then Np <C N and the process 
is 0{N^). Of course, if is calculated directly (an 
0{N^) operation), then this is the most intensive step in 
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calculating the Fisher matrix, and the whole process is 
still 0{N^).] 

The method we outline is completely general, allow- 
ing arbitrary chopping strategies and ofF-diagonal noise 
correlations, including those generated by the subtrac- 
tion of constraints or foreground templates, as explained 
in § H and Appendix We expect that these noise 
correlations will become increasingly important in future 
balloon and satellite experiments, which will exhibit both 
1/f streaking and significant foreground contamination. 
Although wc hope to find techniques that will reduce 
the computational load from 0{N^) to 0{N'^), with gen- 
eral inhomogeneous noise this is a difficult problem. One 
approach is to try to find the best possible approxima- 
tion to the generalized noise matrix which allows fast 
computation, then treat the residual perturbatively. An- 
other is to rely on the special nature of the noise for a 
given experiment. For example, if an approximate set of 
eigenmodes along with their projection onto the spherical 
harmonics is known for the geometry and weighting of a 
particular dataset, then quantities like C~^Ct,p can be 
calculated without explicit inversion or matrix manipula- 
tion. Gorski's cut-sky spherical harmonics |^ have this 
property, but require an 0{N^) Cholesky decomposition 
for their construction. 

For mapping experiments, the parameter derivatives 
Ct/ will be proportional to the Legendre polynomials, 
which can in turn be written as a sum over spherical har- 
monics using the appropriate summation formula. We 
have shown that at high two-dimensional (flat-sky) 
fourier modes with wavenumber |Q| ~ ^ are very useful, 
and expect that they will be effective as we look for ways 
to improve the computational speed. For COBE/DMR, 
using an approximate weight is adequate for some statis- 
tical measures, but for high precision work the residual 
60° correlation and constraints should be taken into ac- 
count. For upcoming balloon and satellite experiments, 
full and correct modelling of the noise and its behaviour 
in various subspaces will be essential for achieving the 
forecasted accuracy [^jj^ in cosmological parameter de- 
terminations. 



B. Redshift Surveys 

So far, we have concentrated our analysis on appli- 
cations to CMB anisotropy data. However, much of it 
can be carried over to estimate the power spectrum of 
other sorts of data, particularly that of upcoming red- 
shift surveys ||3|,|l^ . In that case, we partition the three- 
dimensional volume probed by the surveys into bins i = 
1 . . . N and use counts- in-cells as the data Ai = Si + rii. 
Now, the "beam function" becomes the selection func- 
tion of the survey restricted to the individual bins, which 
accounts for the flux cutoff in its observational bands. 
The noise becomes considerably more complicated: it is 
the "shot noise" which comes from the sampling of the 



underlying density field in whose correlations we are ac- 
tually interested. This noise is not Gaussian, but Poisso- 
nian (and only that if we ignore correlations within the 
bin); to use this formalism requires that we have enough 
galaxies per bin that a Gaussian approximation is ade- 
quate, but small enough that the correlations within the 
bin are ignorable (and small enough that we still have in- 
formation on scales of interest). In that case, the Poisson 
noise term has (nf) given by the counts in the bin. Of 
course, there are further complications due to redshift- 
space distortions. For an alternative to this procedure, 
see |9|. 

C. Summary 

We have demonstrated two techniques for determining 
the power spectrum of CMB fluctuations from realistic 
microwave data. We have presented an analysis of both 
a direct likelihood search and a specific quadratic estima- 
tor; the most important result of this paper is the proof 
that the iterated application of the quadratic estimator 
is a fast method for finding the peak and curvature of the 
likelihood function. 

Our methods easily incorporate such realistic features 
as convoluted chopping strategies, incomplete sky cover- 
age, and the removal of linear constraints from the data. 
As implemented today, our method requires 0{N^) op- 
erations in order to deal with these complications. We 
have applied the techniques to both the DMR and SK 
datasets, which exhibit all of these complications. Nu- 
merically, our results agree quite well with other analyses 
of these datasets. 

We have also discussed several caveats in the further 
use of the power spectrum, associated with the non- 
Gaussian nature of the posterior distribition of the Ci. 
This can have repercussions in any analysis (such as x^, 
or even in our own rebinning techniques) which implicitly 
or explicitly assume Gaussianity of the distribution (i.e., 
the constant curvature of the log-likelihood). 

The traditional procedure for reporting constraints on 
the power spectrum is the band-power method, where the 
power spectrum estimate is considered to be a measure- 
ment of the power averaged through some specific filter. 
In the past this filter has been given by the trace of the 
window function, Wg. We advocate a generalization of 
this procedure where the filter is derived from the Fisher 
matrix instead. With this better definition of the filter, 
the new technique will improve the accuracy of analyses 
that start from band-power estimates. 

D. Quadratic Estimation Cookbook 

We now summarize the complete algorithm for 
quadratic power spectrum estimation: 
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1 . Obtain the data and the error or weight matrix C n 
(including the effects of constraints as discussed in 
Appendix ^). 



2. Choose an initial £ binning, as discussed in § IV 



3. Calculate the window function matrix Wppi{£), 



Eq. 2.8, perhaps averaged over the £ bins. 



4. Choose a power spectrum C^^"* to begin the itera- 
tion. 

5. Calculate Ct for C^*-* (i = for the first iteration), 
from Eq. 2.1 



6. If desired, the rest of the calculation can be per- 
formed in the Signal-to- Noise basis of Appendix 0. 
In that case, Ct and the data are transformed ac- 



cording to Eqs. A4-A5 



7. Calculate the parameter derivati ves Ct.b = 
dCr/dqE in each ban d, u sing Eqs. 2.19 - 2.2C or, 
in the S/N basis, Eqs. A7-A8. The parameter qs, 
Eq. 4.10| , is the fractional difference from cf^. 



8. Calculate the Fisher Matrix, Eq. or Eq. |A1C 
for the chosen bands. 



9. Calculate the com plete quadratic 5qB using 
Eq. |2.18| or Eq. |All| , and set 



10. Lather, rinse, and repeat with Step 5 until 5qB ~ 
to the desired accuracy. 

This description has not included the complications 
associated with rebinning (see § [V ) and the use of filter 
functions for reporting bandpowers (see § VI). 



E. Numerical Results 

Our power spectrum estimates for COBE/DMR and 
SK are available over the WWW and by anonymous 
FTP in the directory f ile : //f tp . cita.utoronto . ca/ 
cita/knox/pspec_Cl/. These numerical results include 
the results of both the fuU-likclihood and quadratic pro- 
cedures; for the latter we include the results for "orthog- 
onalized" and "shaped" bands, along with appropriately 
tabulated filter functions. 
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APPENDIX A: SIGNAL-TO-NOISE 
EIGENMODES & CONSTRAINTS 

Some of the calculations described in this paper 
are performed in the "signal-to-noise eigenmode" basis 
|p^ , p^| -p^. To effect this transformation, we model the 
observation at a pixel as 



Si + n. 



(Al) 



where Si is the contribution to the signal, and to the 
noise. They have zero means, and independent corre- 
lation matrixes {niUii) = CnW and {siSi') = ct^j^Ctm'. 
Here, trth is the unknown amplitude of the signal to be 
measured (along with other possible parameters in Ct)- 
We may ascribe more than the experimental noise con- 
tribution to n,;: in particular, any contributions to the 
observation with which we are not concerned in a given 
part of the calculation can be included in the noise. This 
could be the CMB monopole and dipole, or constraints 
such as averages and gradients that may have been re- 
moved from the data to compensate for atmospheric and 
instrumental drift. For COBE/DMR, we allow arbitrary 
amplitudes for the monopole (one component) and the 
dipole (three components); for SK, we allow an arbitrary 
average for each "demodulation" , giving a total of 66 
separate amplitudes. In the event, each constraint com- 
ponent c can be represented by a template in pixel space, 
Tci, with an unknown amplitude, Kc- Thus, the CMB sig- 
nal plus experimental noise is given by the combination 
A,; — J2c '^c^cii which is distributed as a Gaussian with 
correlation matrix C„ -I- ct^j^Ct. We do not know the am- 
plitudes Kc a priori^ but we can assign them a prior prob- 
ability distribution given by a zero-mean Gaussian with 
very large variances in the matrix (kc^c') = Kcd, (com- 
pared to the expected signal and the experimental noise) , 
and then marginalize over the amplitudes Kc- It turns out 
that this marginalization procedure can be done analyt- 
ically, and the result is that the likelihood is now given 
by a zero-mean Gaussian distribution in A alone, with a 
full correlation matrix including a new term accounting 
for the unknown constraints: 



where 



(AiAj/) — a^Y^CTii' + CnW + Ccii' (A2) 



Ccii' = TciKcc'^c'i' (A3) 



is the constraint or template correlation matrix. For a 
diagonal matrix of priors, K = diag(cr^), this reduces to 

Ccii' = O'r'^ ci'^ ci' ■ 
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In effect, we have added a new term to the noise corre- 
lation, Cm = Cn + C'c', in the foUowing we shall implicitly 
include this in Cat. In the limit ^ oo, this procedure 
is equivalent to projecting out the constrained compo- 
nents from the data and the correlation matrix; because 
this projection results in a singular matrix, the marginal- 
ization procedure is numerically simpler (but see p9[ | for 
the details of an implementation of the projection pro- 
cedure). Note also that this procedure is more generally 
useful: in particular it provides a new technique for re- 
moving foreground contamination with a known spatial 
morphology ]4C| ]. 

With this split of the observation into signal and (gen- 
eralized) noise, we first perform a so-called whitening 
transformation 





^—1/2^ ^—1/2 J 






1 / 2 y y 1/2 




A- 




(A4) 



— 1/2 

Here, is the inverse of the Cholesky decomposition 

of Cat or its Hermitian Square Root. Now, the noise part 

— 1/2 

of the "new data," C^y ' A, are uncorrelated, with unit 
variance. We next diagonalize the signal matrix with its 
matrix of eigenvectors. 



i?tC^i/2CTC-^/'i? = £ = diag(£fe); 



A ^ i?tc^i/2A = e 



(A5) 



In this new basis, the data are uncorrelated with vari- 
ance (^^) = l-|-cr^jj£fe. The £fe are "eigenmodes of signal- 
to-noise"; modes with large eigenvalue are expected to 
be well-measured (for the specific theory matrix Ct used 
in the transformation); modes with small eigenvalue are 
poorly-measured (and do not contribute significantly to 
the likelihood). In particular, we use this transformation 
to compress the SK data: we pick a fiducial model (in this 
case, Us = 1.45 tilted standard CDM, which fits the SK 
data alone reasonably well) and calculate the modes for 
this theory. We then discard all but the top 1200 modes 
(of 2928 data points) and treat this linear combination 
as our new dataset (for which we subsequently calcu- 
late all likelihoods without further approximation); else- 
where ||l^ we show that this truncation to 1200 theory- 
dependent modes is an excellent approximation to the 
entire dataset. 

Note that in the S/N basis, the likelihood as a func- 
tion of the amplitude dth is quite easy to compute for 
arbitrary values: 



21nP(A|at\C,) 



E 



ln(l 



(A6) 



(up to a constant). In the calculation of the likelihood as 
a function of the values of the power spectrum, we iterate 



by ascribing only the single Ci (or within a band, with 
some shape for Ci over the band) of interest to the signal. 
Si, and the rest to the noise, n^, along with the actual ex- 
perimental noise, and any terms due to constraints such 
as dipole removal. This way, the single parameter of in- 
terest at anytime is just the amplitude cr^j^ oc Cg for that 
band, for which the likelihood is easy to compute once 
the S/N mode decomposition has been determined. 

We also compute the quadratic Ci estimators in this 
basis. First, we define the window function matrix 
(Eq. |2.8D transformed into this basis. 



This quantity comes into the calculations because it is 
related to the derivative of the theory covariance in the 
eigenbasis. 



kk'J. 



1/2 \ dCrw f^-i/ 



Gkk'ie). (AS) 



ii' 



Here we have assumed that we are interested in the in- 
dividual Ci values. If we are instead interested in the 
values over some bands, S, of with some assumed spec- 
tral shape Cf^^^'^ , then we use instead 



-kk'.B 



£kk',eC] 



shape 



(A9) 



eeB 



Note that, unlike the full theory covariance, £ — 
diag(£'fc ), th e se de rivatives have off-diagonal components. 
In Eqs. |A10- All below, £kk' ,b and £kk',e can be used in- 



terchangeably, depending on whether one is estimating 
individual Ci values, or those in bands. 

The Fisher matrix for the parameters Cg (Eq. 2.17) 
then becomes 



£kk'.e£k'k.e 



kk' 



{l + al£k)il + al£k')' 



(AlO) 



Then the full quadratic estimator (compare Eq. 2.1q) is 



fefc' 



S,k£kk',e'^k 



[1 + a^^£k){l + al£k') 



St 



kkJ.' 



(All) 



Note that in this formalism the 0{N^) transformation 
into the S /N basis is the most expensive part of the cal- 
culation; the remainder requires trivial 0{N^) sums and 
the inverse of the (comparitively small) Np x Np Fisher 
matrix. 
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APPENDIX B: REBINNING ORTHOGONAL 
LINEAR COMBINATIONS 



Here we derive Eq. |6.4| which tells how to rebin orthog- 
onal linear combinations of . We then generalize to the 
case where the initial binning is coarser than = 1. 

We start by parameterizing the spectrum as 



^shapc 



(Bl) 



and then transforming the qi to q ^ q. If we assume 
the shape is correct, then the expectation value of q^ = 
q\/N\ is independent of A, where N\ = ^^Zg\. Since 
we always want to average things together that we expect 
to be measurements of the same quantity, we average 
together the q^ . Calling the result q^ : 



(B2) 



Here, and in the following, the sums over A and A' extend 
only over the range determined by p. For example, if for 
/3 = 1 we are averaging together the first three linear 
combinations, then the sums over A and A' run from one 
to three. 

~N 

Using the fact that F^y = N\F^yNx' and specializing 
to the case where F^y 



3AA' 



or Z = we get 



~N _ Ea 9aA^a 

9/3 



Ea^^a^ 



Plugging in qx = ZexCe/Cf'''^° and Nx 
algebra shows that the filter function is 



(B3) 



J2i Zix a little 



(B4) 



where 



Im - Zix/C\ 



hape 



is the filter function prior to rcbinning. Therefore 

n Ea 9aVVa T.f. <liCt'''^° fpP 

- AC) 



(C) 

e.J 131 



(B5) 



(B6) 



The second equality follows since when the sum over A 
goes over all A, Z^a-^J^/ = F^gi- 

When the initial binning is coarser than A£ — 1, then 
this procedure is slightly more complicated. We intro- 
duce the sub-band structure filter, /j^/, which is defined 
within each band B. The sub-band structure is given by 



AC) 
J Be 



(C)^shapc 



(B8) 



which is the same as Eq. i.t . The difference here is that 



we have not calculated F^^, and thus must rely on ana- 
lytic knowledge of it. 

The rebinning procedure is the same for the A£ — 1 
initial binning except 



(C)^shape 



(C) 



Efes fBi 

B 



and 



AC) AC) _ 



AC) 
J Be 



J fj' 



(C) 



AC) I ■'l3'B{e) 
,l^eeB J Be / 



(which is the case ior Z = L -^^here 



AC) _ sr "^Bp 

/3e/3' ^B 



(B9) 
(BIO) 

(BU) 
(B12) 



The final result is 



Y.I <lB{e)C, 



shape p{C) 



l^eJp'i 



l3'BqB- 



(B13) 



The last equality is used to define the matrix Xp'B and 
to emphasize that Cp/ is simply a linear transformation 
of the original qs parameters. The Fisher matrix for 
Cf3' can easily be calculated from that for qs, using the 
general rule for how the Fisher matrix changes under 
linear transformation of the parameters. 



which is Eq. p^ . 

As an aside, we consider the case of rebinning all the 
estimates into one bin. We expect that the estimated 
power and filter function in this case should be indepen- 
dent of the basis of the original estimates; they should 
not depend on Z. Indeed, this is the case: 



//^Shapc r7 ry V"^ rt /^shapC 77i(J 

«'A ^e\^e'\ l^u'^el^i, J'ui 



E«'A ^nZc. 



V F'^ 
l^ee' ^ee' 



(B7) 
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